1 Session S8.1: power and battery management: Process cruise control: 85%
event-driven clock scaling for dynamic power management
Andreas Weissel , Frank Bellosa 

Proceedings of the international conference on Compilers, architecture, and
synthesis for embedded systems October 2002

Scalability of the core frequency is a common feature of low-power processor 
architectures. Many heuristics for frequency scaling were proposed in the past to find 
the best trade-off between energy efficiency and computational performance. With 
complex applications exhibiting unpredictable behavior these heuristics cannot reliably 
adjust the operation point of the hardware because they do not know where the 
energy is spent and why the performance is lost. Embedded hardware monitors in the 
form of ... 



2 Modeling, simulation, sensitivity analysis, and optimization of hybrid 84%
systems

Paul I. Barton , Cha Kun Lee 

ACM Transactions on Modeling and Computer Simulation (TOMACS) October 2002 
Volume 12 Issue 4 

Hybrid (discrete/continuous) systems exhibit both discrete state and continuous state
dynamics which interact to such a significant extent that they cannot be decoupled and 
must be analyzed simultaneously. We present an overview of the work that has been 
done in the modeling, simulation, sensitivity analysis, and optimization of hybrid 
systems, paying particular attention to the interaction between discrete and 
continuous dynamics. A concise intuitive framework for hybrid system modeling is 
pres ... 

3 System-level power optimization: techniques and tools 83% 

Luca Benini , Giovanni de Micheli 
ACM Transactions on Design Automation of Electronic Systems (TODAES) April 2000
Volume 5 Issue 2

This tutorial surveys design methods for energy-efficient system-level design. We 
consider electronic systems consisting of a hardware platform and software layers. We
consider the three major constituents of hardware that consume energy, namely 
computation, communication, and storage units, and we review methods of reducing 
their energy consumption. We also study models for analyzing the energy cost of 
software, and methods for energy-efficient software design and compilation. This 
survey ...



4 System support for automatic profiling and optimization 

Xiaolan Zhang , Zheng Wang , Nicholas Gloy , J. Bradley Chen , Michael D. Smith 
ACM SIGOPS Operating Systems Review , Proceedings of the sixteenth ACM 
symposium on Operating systems principles October 1997 
Volume 31 Issue 5 




5 Continuous program optimization: A case study 82% 
Thomas Kistler , Michael Franz 

ACM Transactions on Programming Languages and Systems (TOPLAS) July 2003 
Volume 25 Issue 4 

Much of the software in everyday operation is not making optimal use of the hardware 
on which it actually runs. Among the reasons for this discrepancy are 
hardware/software mismatches, modularization overheads introduced by software 
engineering considerations, and the inability of systems to adapt to users' behaviors. A 
solution to these problems is to delay code generation until load time. This is the 
earliest point at which a piece of software can be fine-tuned to the actual capabilities 
of the ... 



6 Performance evaluation of dynamic routing based on the use of 
satellites and intelligent networks

L. Bella , F. Chummun , M. Conte , G. Fischer , J. Rammer 
Wireless Networks February 1998 
Volume 4 Issue 2 

A dynamic routing scheme for public switched telephone networks is introduced which 
employs satellite broadcast to distribute network load data. The proposed network 
architecture closely resembles the IN (Intelligent Network) architecture, whereby the 
IN SCPs (Service Control Points) serve as so-called Routing Control Points (RCPs). The 
key functions of an RCP are (i) to execute the routing algorithm and issue routing 
instructions in response to routing queries it receives from its associat ... 



7 Application domains for fixed-length block structured architectures 80%
Lieven Eeckhout , Tom Vander Aa , Bart Goeman , Hans Vandierendonck , Rudy 
Lauwereins , Koen De Bosschere 

Australian Computer Science Communications , Proceedings of the 6th 
Australasian conference on Computer systems architecture January 2001 
Volume 23 Issue 4 

In order to tackle the growing complexity and interconnects problem in modern 
microprocessor architectures, computer architects have come up with new 
architectural paradigms. A fixed-length block structured architecture (BSA) is one of 
these paradigms. The basic idea of a BSA is to generate blocks of instructions, called 
BSA-blocks, statically (by the compiler) and executing these blocks on a decentralized 
microarchitecture. In this paper, we focus on possible application domains for this ...

8 Multimedia and graphics: Enhancing loop buffering of media and
telecommunications applications using low-overhead predication

John W. Sias , Hillery C. Hunter , Wen-mei W. Hwu 

Proceedings of the 34th annual ACM/IEEE international symposium on
Microarchitecture December 2001

Media- and telecommunications-focused processors, increasingly designed as deeply 
pipelined, statically-scheduled VLIWs, rely on loop buffers for low-overhead execution 
of simple loops. Key loops containing control flow pose a substantial problem— full 
predication has a high encoding overhead, and partial predication techniques do not 
support if-conversion, the transformation of general acyclic control flow into predicated 
blocks. Using a set of significant media processing benchmarks, drawn fr ... 



9 Multimedia and graphics: Saving energy with architectural and
frequency adaptations for multimedia applications

Christopher J. Hughes , Jayanth Srinivasan , Sarita V. Adve 

Proceedings of the 34th annual ACM/IEEE international symposium on
Microarchitecture December 2001

General-purpose processors are expected to be increasingly employed for multimedia 
workloads on systems where reducing energy consumption is an important goal. 
Researchers have proposed the use of two forms of hardware adaptation - 
architectural adaptation and dynamic voltage (and frequency) scaling or DVS - to 
reduce energy. This paper develops and evaluates an integrated algorithm to control 
both architectural adaptation and DVS targeted to multimedia applications. It also 
examines the interac ... 



10 Session 6B: Convergence of abstractions in high-level synthesis: 
Application-driven processor design exploration for power-performance
trade-off analysis 

Diana Marculescu , Anoop Iyer 

Proceedings of the 2001 IEEE/ACM international conference on Computer-aided
design November 2001

This paper presents an efficient design exploration environment for high-end core 
processors. The heart of the proposed design exploration framework is a two-level 
simulation engine that combines detailed simulation for critical portions of the code 
with fast profiling for the rest. Our two-level simulation methodology relies on the 
inherent clustered structure of application programs and is completely general and 
applicable to any microarchitectural power/performance simulation engine. The 
prop ... 



11 Modelling and performance evaluation of mobile multimedia systems
using QoS-GSPN
Tony Tsang 

Wireless Networks November 2003 
Volume 9 Issue 6 

Quality of Service (QoS) measurement of multimedia applications is one of the most 
important issues for call handoff and call admission control in mobile networks. Based 
on the QoS measures, we propose a Generalized Stochastic Petri Net (GSPN) based 
model, called QoS-GSPN, which can express the real-time behavior of QoS 
measurement for mobile networks. QoS-GSPN performance analysis methodology 
includes the formal expression and performance analysis environment. It offers the 
promise of providing ...

12 Profile-Based Dynamic Voltage Scheduling Using Program Checkpoints 80%
A. Azevedo , I. Issenin , R. Cornea , R. Gupta , N. Dutt , A. Veidenbaum , A. Nicolau

Proceedings of the conference on Design, automation and test in Europe March 2002

Dynamic voltage scaling (DVS) is a known effective mechanism for reducing CPU
energy consumption without significant performance degradation. While a lot of
work has been done on inter-task scheduling algorithms to implement DVS under
operating system control, new research challenges exist in intra-task DVS techniques
under software and compiler control. In this paper we introduce a novel intra-task DVS
technique under compiler control using program checkpoints. Checkpoints are
generated at compile time ...

13 Integrated program measurement and documentation tools 80%
Anne Schroeder

Proceedings of the 7th international conference on Software engineering March 1984

This paper describes an attempt to integrate the collection and the efficient utilisation 
of measurements in the development and the use of programs. The work presented 
consists in three parts: - the design of both static and dynamic measurement tools, - 
examples of data processing on measurements collected on a sample of Pascal 
programs, - the design of a quantitative documentation of a program, which is 
automatically built as measurements are collected. 

14 Characterizations of parallelism in applications and their use in 80% 



K. C. Sevcik

ACM SIGMETRICS Performance Evaluation Review , Proceedings of the 1989 ACM
SIGMETRICS international conference on Measurement and modeling of
computer systems April 1989 
Volume 17 Issue 1 

As multiprocessors with large numbers of processors become more prevalent, we face 
the task of developing scheduling algorithms for the multiprogrammed use of such 
machines. The scheduling decisions must take into account the number of processors 
available, the overall system load, and the ability of each application awaiting 
activation to make use of a given number of processors. The parallelism within an 
application can be characterized at a number of different levels of detail ... 

15 Compilation and run-time systems: Vacuum packing: extracting 80%
hardware-detected program phases for post-link optimization
Ronald D. Barnes , Erik M. Nystrom , Matthew C. Merten , Wen-mei W. Hwu 
Proceedings of the 35th annual ACM/IEEE international symposium on 
Microarchitecture November 2002 

This paper presents Vacuum Packing, a new approach to profile-based program 
optimization. Instead of using traditional aggregate or summarized execution profile 
weights, this approach uses a transparent hardware profiler to automatically detect 
execution phases and record branch profile information for each new phase. The code 
extraction algorithm then produces code packages that are specially formed for their 
corresponding phases. The algorithm compensates for the incomplete and often 
incoheren ...

16 SIGSAM BULLETIN: Computer algebra in the life sciences 80%

Michael P. Barnett
ACM SIGSAM Bulletin December 2002
Volume 36 Issue 4

This note (1) provides references to recent work that applies computer algebra (CA) to 
the life sciences, (2) cites literature that explains the biological background of each 
application, (3) states the mathematical methods that are used, (4) mentions the 
benefits of CA, and (5) suggests some topics for future work. 



17 Application-adaptive intelligent cache memory system 
Jung-Hoon Lee , Shin-Dug Kim , Charles Weems 

ACM Transactions on Embedded Computing Systems (TECS) November 2002 
Volume 1 Issue 1 

This article presents the design of a simple hardware-controlled, high performance 
cache system. The design supports fast access time, optimal utilization of temporal 
and spatial localities adaptive to given applications, and a simple dynamic fetching 
mechanism with different fetch sizes. Support for dynamically varying the fetch size 
makes the cache equally effective for general-purpose as well as multimedia 
applications. Our cache organization and operational mechanism are especially 
designed ... 



18 Session 5B: mobile software agents: Just-in-time information sharing 80%
architectures in multiagent systems

Jonathan Carter , Ali A. Ghorbani , Stephen Marsh

Proceedings of the first international joint conference on Autonomous agents
and multiagent systems: part 2 July 2002

ACORN (Agent-based Community Oriented Routing Network) is a distributed multi- 
agent architecture for the search, distribution and management of information across 
networks. ACORN utilises the concept of 'information as agent' together with an 
application of Stanley Milgram's Small World Problem (the idea of the Six Degrees of 
Separation) in order to route individual items of information around a network of 
people and agents. This paper describes additions made to the ACORN architecture 
and the i ... 



19 Task assignment with unknown duration 80% 

Journal of the ACM (JACM) March 2002
Volume 49 Issue 2 

We consider a distributed server system and ask which policy should be used for 
assigning jobs (tasks) to hosts. In our server, jobs are not preemptible. Also, the job's 
service demand is not known a priori. We are particularly concerned with the case 
where the workload is heavy-tailed, as is characteristic of many empirically measured 
computer workloads. We analyze several natural task assignment policies and propose 
a new one TAGS (Task Assignment based on Guessing Size). The TAG ... 



20 High-quality operation binding for clustered VLIW datapaths 80%

Viktor S. Lapinskii , Margarida F. Jacome , Gustavo A. de Veciana

Proceedings of the 38th conference on Design automation June 2001

Clustering is an effective method to increase the available parallelism in VLIW 
datapaths without incurring severe penalties associated with large number of register 
file ports. Efficient utilization of a clustered datapath requires careful binding of 
operations to clusters. The paper proposes a binding algorithm that effectively 
explores tradeoffs between in-cluster operation serialization and delays associated 
with data transfers between clusters. Extensive experimental evidence is provid ...

1 Dynamic trace selection using performance monitoring hardware
sampling

Chen, H.; Wei-Chung Hsu; Jiwei Lu; Pen-Chung Yew; Dong-Yuan Chen; 
Code Generation and Optimization, 2003. CGO 2003. International Symposium 
Pages: 79 - 90

2 TEST: a Tracer for Extracting Speculative Threads

3 Vacuum packing: extracting hardware-detected program phases for 
post-link optimization 

Barnes, R.D.; Nystrom, E.M.; Merten, M.C.; Hwu, W.W.; 
Microarchitecture, 2002. (MICRO-35). Proceedings. 35th Annual IEEE/ACM 
International Symposium on , 18-22 Nov. 2002 